Skip to content

[mlir][xegpu] Add definition of SliceAttr #150146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Aug 8, 2025
Merged

Conversation

chencha3
Copy link
Contributor

@chencha3 chencha3 commented Jul 22, 2025

This PR introduces the definition of SliceAttr, which complements the existing LayoutAttr by providing a mechanism to describe data distribution for operations such as multi_reduction and broadcast. For potential convenience of lowering pass, A common interface (LayoutTrait) is created for LayoutAttr and SliceAttr to provide an util to compute offsets for a given subgroup (id), and workgroup-level problem size (shape). the WgToSgCreateNdOp is updated to test the implementation of such interface for LayoutAttr, and a test pass --test-xegpu-layout-interface is introduced to test its implementation for SliceAttr.
Additional use cases and lowering support for SliceAttr will be addressed in subsequent PRs.

Copy link

github-actions bot commented Jul 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@chencha3 chencha3 marked this pull request as ready for review July 25, 2025 17:28
/// drops the data in the specified dimension, and return the rest. e.g.,
/// for data = [32, 64, 8], dropPositions = [0, 2], it will return [64]
template<typename T, typename U>
static llvm::SmallVector<T> dropDims(llvm::ArrayRef<T> data, llvm::ArrayRef<U> dropPositions) {
Copy link
Contributor

@Jianhui-Li Jianhui-Li Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : dropDims -> sliceDims? data->dims, dropPosition -> dropDims

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it to static llvm::SmallVector<T> slice(llvm::ArrayRef<T> shape, llvm::ArrayRef<U> dims). is it good to you?

Value add = rewriter.createOrFold<index::AddOp>(
loc, tdescOffsets[i],
getValueOrCreateConstantIndexOp(rewriter, loc, oldOffsets[idx]));
newOffsets.push_back(add);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: oldOffsets: wgTileOffsets
newOffsets: sgTileOffsets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion. Fixed.

}];

let assemblyFormat = "`<` struct(params) `>`";
let genVerifyDecl = 1;
}


def XeGPU_SliceAttr : XeGPUAttr<"Slice", "slice", [LayoutTrait]> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is desirable to allow nested slice attribute to match the staged reduction use case, where reduction may follow another reduction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the support for nested SliceAttr has been enabled. @Jianhui-Li

@@ -50,4 +50,21 @@ gpu.func @convert_layout_wg(%a: vector<32x64xf16>) {
gpu.return
}

gpu.func @slice_attr_repeat_dim() {
//CHECK: arith.constant {layout_result_0 = #xegpu.slice<<sg_layout = [16, 1, 1], sg_data = [1, 8, 2]>, dims = [2]>} dense<8> : vector<16x8xindex>
%cst = arith.constant {layout_result_0 = #xegpu.slice<<sg_layout = [16, 1, 1], sg_data = [1, 8, 2]>, dims = [2]>} dense<8> : vector<16x8xindex>
Copy link
Contributor

@nbpatel nbpatel Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be #xegpu.slice<<#xegpu.layout<sg_layout = [16, 1, 1], sg_data = [1, 8, 2]>>, dims = [2]>}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the print format issue. it is fixed.

@@ -7,6 +7,7 @@ add_mlir_dialect_library(MLIRXeGPUDialect

DEPENDS
MLIRXeGPUIncGen
MLIRXeGPUAttrInterfaceIncGen
MLIRXeGPUAttrsIncGen
MLIRXeGPUEnumsIncGen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont you need to link MLIRIndexDialect ,MLIRAffineUtils here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Yes, they are needed.

@charithaintc charithaintc self-requested a review August 5, 2025 18:38
Copy link
Contributor

@charithaintc charithaintc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments.

"delinearizeSubgroupId",
(ins "OpBuilder &": $builder, "Location":$loc, "Value":$linearId)>,
InterfaceMethod<"Get the local offset to be accessed by the given subgroup Id",
"FailureOr<SmallVector<SmallVector<Value>>>",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the need for having a vector<vector<>> here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since each subgroup may be assigned with multiple blocks.

#r = #xegpu.slice<#l, dim = 0>

%exp = math.exp %input {layout_result_0 = #l}: vector<256x128xf32>
%red = vector.multi_reduction<add>, %exp, %acc [0] {layout_result_0 = #r}: vector<256x128xf32> to vector<128xf32>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to add a comment here explaning the output layout of %red.

here the output sg layout is still [8, 4]. however data is shared along dim 0. So effectively there are 8 slices of [1, 4] SG segments each owning [1, 32] data. For example SGs [0-7][0] owns the same 1x32 data segement. 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it

/// drops/slices the shape in the specified dims, and return the rest. e.g.,
/// for shape = [32, 64, 8], dims = [0, 2], it will return [64]
template<typename T, typename U>
static llvm::SmallVector<T> slice(llvm::ArrayRef<T> shape, llvm::ArrayRef<U> dims) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this unrelated to XeGPUDialect. any reason for placing here? can it be moved to Utils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving it to Utils will incur recursive dependence for linker.

@@ -33,6 +37,54 @@ void XeGPUDialect::initialize() {
>();
}

// generate offsets computing instructions for a subgroup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Use triple line comments for def/impl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if (!isWgLayout())
return failure();

// TODO: handle order attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe assert unavailability of Order attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}

FailureOr<SmallVector<SmallVector<Value>>>
LayoutAttr::getOffsets(OpBuilder &builder, Location loc, Value linearId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add a comment on what the purpose of this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// check every element in dims is unique and smaller than rank
llvm::SmallDenseSet<int64_t> seen;
for (int64_t dim : dims.asArrayRef()) {
if (dim >= rank)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we check if dim >= 0 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Fixed.

#define DBGS() (llvm::dbgs() << "[" DEBUG_TYPE "]: ")
#define LDBG(X) LLVM_DEBUG(DBGS() << X << "\n")

class TestStepOpPattern : public OpConversionPattern<vector::StepOp> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment on what this pattern is doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed.

@chencha3 chencha3 requested review from Jianhui-Li and nbpatel August 6, 2025 17:46
Copy link
Contributor

@nbpatel nbpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

mlir_tablegen(XeGPUAttrInterface.h.inc -gen-attr-interface-decls)
mlir_tablegen(XeGPUAttrInterface.cpp.inc -gen-attr-interface-defs)
add_public_tablegen_target(MLIRXeGPUAttrInterfaceIncGen)
add_dependencies(mlir-headers MLIRXeGPUAttrInterfaceIncGen)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


auto newCreateNdOp = xegpu::CreateNdDescOp::create(
rewriter, loc, newTdescTy, op.getSource(), globalOffsets,
SmallVector<OpFoldResult> wgTileOffsets = op.getMixedOffsets();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe better to not use tile and just use wgOffsets and sgOffsets as var names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}];

let parameters = (ins
"xegpu::LayoutTrait": $parent,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is parent a trait and not LayoutAttr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to support using both LayoutAttr and SliceAttr as parent. The later one is nested definition.

static SmallVector<SmallVector<Value>>
genOffsetsComputations(OpBuilder &builder, Location loc,
SmallVector<Value> sgId, ArrayRef<int64_t> sgLayout,
ArrayRef<int64_t> sgShape, ArrayRef<int64_t> shape) {
Copy link
Contributor

@Jianhui-Li Jianhui-Li Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider generalizing sgShape to sgDataStride.
From the caller size, need to set as following :
sgDataStride[i] = sgData[i] if (wgData[i] % (sgData[i]*sgLayout[i])) == 0
sgDataStride[i] = 0 if wgData[i] == sgData[i]

if (auto maybeSgShape = getSgDataAsInt())
sgShape = maybeSgShape.value();
else if (auto ratio = computeShapeRatio(shape, sgLayout))
sgShape = ratio.value();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shape == ratio?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the variable name to derivedShape for clarification per discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you change the last place, but missed this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

//CHECK: [[C128:%.+]] = arith.constant 128 : index
//CHECK: [[X:%.+]] = index.remu [[UX]], [[C128]]
//CHECK: [[TDESC:%.+]] = xegpu.create_nd_tdesc [[ARG_0]][[[Y]], [[X]]] : memref<256x128xf32> -> !xegpu.tensor_desc<32x32xf32, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>
%tdesc = xegpu.create_nd_tdesc %src[0, 0] : memref<256x128xf32>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider adding a test:

%tdesc = xegpu.create_nd_tdesc %src[0, 0] : memref<256x32xf32>
-> !xegpu.tensor_desc<256x32xf32, #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 32], lane_layout = [1, 16], lane_data = [1, 1]>>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, added one.

Copy link
Contributor

@Jianhui-Li Jianhui-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chencha3 chencha3 merged commit c962234 into llvm:main Aug 8, 2025
9 checks passed
rupprecht added a commit to rupprecht/llvm-project that referenced this pull request Aug 8, 2025
rupprecht added a commit that referenced this pull request Aug 8, 2025
Garra1980 added a commit to intel/mlir-extensions that referenced this pull request Aug 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants